Introduction

For aspiring and current IT professionals, this project uncovers salary insights across a wide variety of IT professions. The goal is to inform decisions about which profession to enter or transition into, which programming language to learn, which industry to go into, what size of company to work for, and which German city to live in to earn an average to above-average salary.

To do this, I explore a survey dataset about German IT professionals. I take a close look at what the higher-paid professionals have in common: their programming knowledge, the type and size of company they work at, their years of experience, their seniority level, and the city they live in. The goal is to bring to light information that isn’t clear at first glance and use it to help others make life-changing decisions.

With this approach, I hope that my fellow aspiring and current IT professionals come away with a better understanding of which profession to enter or transition into.

Tools Required

library(knitr)        # used for dynamic report generation
library(rmarkdown)    # used for document conversion
library(ggplot2)      # used for data visualization
library(plotly)       # used for interactive plots
library(dplyr)        # used for data manipulation
library(readr)        # used to read rectangular data
library(formattable)  # used for formatted tables

Data Import and Preparation

Source of the data

IT Salary Survey for EU Region (2018-2020) (https://www.kaggle.com/datasets/parulpandey/2020-it-salary-survey-for-eu-region?resource=download). I sourced this data from the Kaggle public datasets. The survey was created by Sergey Vasilyev, who has conducted it since 2015. Its purpose is to help discover salary patterns among IT professionals in the EU region.

| Age | Gender | City | Position | Total years of experience | Seniority level | Your main technology / programming language | Base_Salary | Сontract duration | Main language at work | Company size | Company type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 26 | Male | Munich | Software Engineer | 5 | Senior | TYPESCRIPT | 80000 | Unlimited contract | English | 51-100 | Product |
| 26 | Male | Berlin | Backend Developer | 7 | Senior | RUBY | 80000 | Unlimited contract | English | 101-1000 | Product |
| 29 | Male | Berlin | Software Engineer | 12 | Lead | JAVASCRIPT | 120000 | Temporary contract | English | 101-1000 | Product |
| 28 | Male | Berlin | Frontend Developer | 4 | Junior | JAVASCRIPT | 54000 | Unlimited contract | English | 51-100 | Startup |
| 37 | Male | Berlin | Backend Developer | 17 | Senior | C | 62000 | Unlimited contract | English | 101-1000 | Product |
| 32 | Male | Berlin | DevOps | 5 | Senior | AWS | 76000 | Unlimited contract | English | 11-50 | Startup |
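The import step itself is not shown in the report. A minimal sketch of how the data might be read with readr follows; the file location is an assumption, so a small temporary CSV with two rows and a subset of the columns from the preview stands in for the downloaded Kaggle file.

```r
library(readr)

# Hypothetical stand-in for the downloaded Kaggle CSV: two rows and a
# subset of columns copied from the preview, written to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Age,Gender,City,Position,Base_Salary",
  "26,Male,Munich,Software Engineer,80000",
  "26,Male,Berlin,Backend Developer,80000"
), csv_path)

# read_csv() guesses column types; show_col_types = FALSE hides the type summary.
Salaries <- read_csv(csv_path, show_col_types = FALSE)
head(Salaries)
```

With the real file, only `csv_path` would change to point at the downloaded CSV.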

Variable names and definitions:
Age: age of the respondent
Gender: gender of the respondent
City: city the respondent works in
Position: position the respondent works in
Total years of experience: number of years of experience the respondent has
Seniority level: the respondent’s seniority level
Base_Salary: the respondent’s base salary
Your main technology / programming language: the main programming language or technology the respondent uses at work
Contract duration: the period over which the respondent’s contract is effective
Main language at work: the main spoken language the respondent uses at work
Company size: number of people who work at the respondent’s company
Company type: the type of company the respondent works for (for example, product or startup)

Exploratory Data Analysis And Visualization

I added three new variables: number_responses_per_city, the total number of responses per city; number_people_per_position, the total number of people who hold the same position; and number_people_per_main_tech, the total number of people who use the same main technology/programming language at work.

| Age | Gender | City | Position | Total years of experience | Seniority level | Your main technology / programming language | Base_Salary | Сontract duration | Main language at work | Company size | Company type | number_responses_per_city | number_people_per_position | number_people_per_main_tech |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 26 | Male | Munich | Software Engineer | 5 | Senior | TYPESCRIPT | 80000 | Unlimited contract | English | 51-100 | Product | 236 | 388 | 31 |
| 26 | Male | Berlin | Backend Developer | 7 | Senior | RUBY | 80000 | Unlimited contract | English | 101-1000 | Product | 681 | 174 | 23 |
| 29 | Male | Berlin | Software Engineer | 12 | Lead | JAVASCRIPT | 120000 | Temporary contract | English | 101-1000 | Product | 681 | 388 | 116 |
| 28 | Male | Berlin | Frontend Developer | 4 | Junior | JAVASCRIPT | 54000 | Unlimited contract | English | 51-100 | Startup | 681 | 89 | 116 |
| 37 | Male | Berlin | Backend Developer | 17 | Senior | C | 62000 | Unlimited contract | English | 101-1000 | Product | 681 | 174 | 98 |
| 32 | Male | Berlin | DevOps | 5 | Senior | AWS | 76000 | Unlimited contract | English | 11-50 | Startup | 681 | 57 | 6 |
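The report does not show how the three count columns were built. One way to derive them, sketched here on a toy data frame, is dplyr's `add_count()`. The toy values are illustrative, and `main_tech` is a shortened, hypothetical name for the "Your main technology / programming language" column.

```r
library(dplyr)

# Toy stand-in for the survey data (illustrative values, not the real data).
Salaries <- tibble(
  City      = c("Berlin", "Berlin", "Munich"),
  Position  = c("Software Engineer", "DevOps", "Software Engineer"),
  main_tech = c("JAVA", "AWS", "JAVA")
)

# add_count() appends the size of each group as a new column,
# keeping one row per respondent.
Salaries <- Salaries %>%
  add_count(City,      name = "number_responses_per_city") %>%
  add_count(Position,  name = "number_people_per_position") %>%
  add_count(main_tech, name = "number_people_per_main_tech")
Salaries
```

A `group_by()` followed by `mutate(n())` would give the same columns; `add_count()` is simply the shorter form.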

Upon exploring the dataset, I noticed that about half of the cities in the survey had only one response, some technologies/programming languages had very few users, two people reported an unrealistic salary, a few people didn’t give their gender, and two people gave their gender as diverse. To get an accurate representation of the majority of respondents, I filtered the data down to people who made less than a million euros, whose gender was male or female, whose main technology/programming language had more than nine users, and whose city had more than nine responses. This left 949 of the original 1,253 responses.

Sex <- c("Male", "Female")  # genders kept in the filtered data
Salaries2 <- Salaries %>%
  filter(number_responses_per_city >= 10,    # cities with more than nine responses
         Base_Salary < 1000000,              # drop unrealistic salaries
         Gender %in% Sex,
         number_people_per_main_tech > 9)    # technologies with more than nine users
headsal2 <- head(Salaries2)
formattable(headsal2)

| Age | Gender | City | Position | Total years of experience | Seniority level | Your main technology / programming language | Base_Salary | Сontract duration | Main language at work | Company size | Company type | number_responses_per_city | number_people_per_position | number_people_per_main_tech |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 26 | Male | Munich | Software Engineer | 5 | Senior | TYPESCRIPT | 80000 | Unlimited contract | English | 51-100 | Product | 236 | 388 | 31 |
| 26 | Male | Berlin | Backend Developer | 7 | Senior | RUBY | 80000 | Unlimited contract | English | 101-1000 | Product | 681 | 174 | 23 |
| 29 | Male | Berlin | Software Engineer | 12 | Lead | JAVASCRIPT | 120000 | Temporary contract | English | 101-1000 | Product | 681 | 388 | 116 |
| 28 | Male | Berlin | Frontend Developer | 4 | Junior | JAVASCRIPT | 54000 | Unlimited contract | English | 51-100 | Startup | 681 | 89 | 116 |
| 37 | Male | Berlin | Backend Developer | 17 | Senior | C | 62000 | Unlimited contract | English | 101-1000 | Product | 681 | 174 | 98 |
| 37 | Male | Berlin | Frontend Developer | 6 | Middle | JAVASCRIPT | 57000 | Unlimited contract | English | 11-50 | Product | 681 | 89 | 116 |

Age Distributions

To justify cutting down the responses, I compared the mean and median age in the original and filtered datasets.

The original dataset’s mean age is 32.51 and its median age is 32. The filtered dataset’s mean age is 32.5 and its median age is 32. The mean and median ages barely changed between the two datasets.
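A comparison like this could be computed in one pass by stacking both datasets with a label and summarising per label. The sketch below assumes the data frames are called `Salaries` and `Salaries2` as in the filtering code; the ages are toy values, not the survey data.

```r
library(dplyr)

# Toy stand-ins for the original and filtered data (illustrative values only).
Salaries  <- tibble(Age = c(25, 30, 32, 41, NA))
Salaries2 <- tibble(Age = c(30, 32, 41))

# Stack both datasets, label each row with its source, and summarise per source.
# na.rm = TRUE skips respondents who did not report their age.
age_summary <- bind_rows(original = Salaries, filtered = Salaries2, .id = "dataset") %>%
  group_by(dataset) %>%
  summarise(mean_age   = mean(Age, na.rm = TRUE),
            median_age = median(Age, na.rm = TRUE))
age_summary
```

The same pattern applies to any other numeric column, such as years of experience or base salary, by swapping the column name.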

Gender Counts

To justify cutting down the responses, I compared the number of respondents by gender in the original and filtered datasets.

Original dataset:

| Gender | n |
|---|---|
| Male | 1049 |
| Female | 192 |
| Diverse | 2 |

Filtered dataset:

| Gender | n |
|---|---|
| Male | 808 |
| Female | 141 |

About 77% of the male and 73% of the female respondents remain. Although more males than females were cut in absolute terms, the filtered data is still a fair representation of the original because the proportion of males to females changed only slightly after the filtering.
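The percentages can be checked directly from the counts in the two gender tables. This is a small arithmetic sketch in base R; the male share is computed among male and female responses only.

```r
# Counts taken from the two gender tables.
original <- c(Male = 1049, Female = 192)
filtered <- c(Male = 808,  Female = 141)

# Share of each gender that survived the filter, in percent.
retained <- round(100 * filtered / original, 1)
retained

# Share of males among male and female responses, before and after filtering.
male_share <- round(100 * c(
  original = original[["Male"]] / sum(original),
  filtered = filtered[["Male"]] / sum(filtered)
), 1)
male_share
```

The male share moves from about 84.5% to about 85.1%, which is the "only slightly changed" proportion referred to in the text.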

City Counts

To justify cutting down the responses, I compared the number of respondents per city in the original and filtered datasets.

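Original dataset:

| City | n |
|---|---|
| Berlin | 681 |
| Munich | 236 |
| Frankfurt | 44 |
| Hamburg | 40 |
| Stuttgart | 33 |
| Cologne | 20 |
| Düsseldorf | 15 |
| Amsterdam | 9 |
| Karlsruhe | 7 |
| Nürnberg | 7 |

Filtered dataset:

| City | n |
|---|---|
| Berlin | 621 |
| Munich | 203 |
| Frankfurt | 37 |
| Hamburg | 36 |
| Stuttgart | 26 |
| Cologne | 16 |
| Düsseldorf | 10 |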

Comparing the cities with the most responses, every city that remains in the filtered data kept at least about 67% of its responses.
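The per-city retention can be reproduced by joining the two count tables. The sketch below types the counts in by hand from the tables; in the actual analysis they would presumably come from `count(City)` on each dataset, which is an assumption about the workflow.

```r
library(dplyr)

cities <- c("Berlin", "Munich", "Frankfurt", "Hamburg",
            "Stuttgart", "Cologne", "Düsseldorf")

# Counts copied from the original and filtered city tables.
city_original <- tibble(City = cities, n_original = c(681, 236, 44, 40, 33, 20, 15))
city_filtered <- tibble(City = cities, n_filtered = c(621, 203, 37, 36, 26, 16, 10))

# Join on city, compute the percentage kept, and list the lowest first.
city_retention <- inner_join(city_original, city_filtered, by = "City") %>%
  mutate(pct_kept = round(100 * n_filtered / n_original, 1)) %>%
  arrange(pct_kept)
city_retention
```

Düsseldorf keeps the smallest share (10 of 15 responses), and Berlin keeps the largest (621 of 681).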

Position Counts

To justify cutting down the responses, I compared the number of respondents per position in the original and filtered datasets.

Original dataset:

| Position | n |
|---|---|
| Software Engineer | 388 |
| Backend Developer | 174 |
| Data Scientist | 110 |
| Frontend Developer | 89 |
| QA Engineer | 71 |
| DevOps | 57 |
| Mobile Developer | 53 |
| ML Engineer | 42 |
| Product Manager | 39 |
| Data Engineer | 28 |

Filtered dataset:

| Position | n |
|---|---|
| Software Engineer | 314 |
| Backend Developer | 148 |
| Frontend Developer | 73 |
| Data Scientist | 71 |
| QA Engineer | 54 |
| Mobile Developer | 41 |
| DevOps | 36 |
| Product Manager | 33 |
| ML Engineer | 28 |
| Data Engineer | 23 |

After the filtering, we still have a good variety of responses per position. The top ten positions are the same in both datasets, with only mild changes in their order.
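The claim that the top ten positions match can be checked mechanically. This is a small base R sketch using the position names in the order they appear in the two tables.

```r
# Top ten positions, in rank order, copied from the two tables.
original_top10 <- c("Software Engineer", "Backend Developer", "Data Scientist",
                    "Frontend Developer", "QA Engineer", "DevOps",
                    "Mobile Developer", "ML Engineer", "Product Manager",
                    "Data Engineer")
filtered_top10 <- c("Software Engineer", "Backend Developer", "Frontend Developer",
                    "Data Scientist", "QA Engineer", "Mobile Developer",
                    "DevOps", "Product Manager", "ML Engineer",
                    "Data Engineer")

# Same ten positions, regardless of order?
same_set <- setequal(original_top10, filtered_top10)

# Rank shift per position: positive means it moved down after filtering.
rank_shift <- match(original_top10, filtered_top10) - seq_along(original_top10)
names(rank_shift) <- original_top10
same_set
rank_shift
```

No position moves by more than one rank, which matches the "mild" reordering described.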

Experience Distribution

The original dataset’s mean number of years of experience is 8.76 and its median is 8 years. The filtered dataset’s mean is 8.86 years and its median is 8 years. The mean and median years of experience barely shifted, which suggests we can move forward with the filtered data.

Seniority Counts

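Original dataset:

| Seniority level | n |
|---|---|
| Senior | 565 |
| Middle | 366 |
| Lead | 166 |
| Junior | 79 |
| Head | 44 |
| Entry level | 4 |

Filtered dataset:

| Seniority level | n |
|---|---|
| Senior | 444 |
| Middle | 265 |
| Lead | 135 |
| Junior | 47 |
| Head | 38 |
| Principal | 3 |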

Looking at the number of respondents per seniority level in both datasets, the counts drop noticeably, but each of the five most common levels kept roughly 60% or more of its responses (Junior kept the least, at about 59%). The top five seniority levels are the same in both datasets.

Main Language Counts

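Original dataset:

| Main language at work | n |
|---|---|
| English | 1024 |
| German | 186 |
| Russian | 15 |
| Italian | 3 |
| Spanish | 3 |

Filtered dataset:

| Main language at work | n |
|---|---|
| English | 812 |
| German | 123 |
| Russian | 4 |
| Deuglisch | 1 |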

Comparing the number of respondents per main language used at work, English remained the most common language in the workplace in both datasets. German and Russian came in second and third, respectively.

Main Tech Counts

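Original dataset:

| Your main technology / programming language | n |
|---|---|
| PYTHON | 228 |
| JAVA | 216 |
| JAVASCRIPT | 116 |
| C | 98 |
| PHP | 73 |
| GO | 32 |
| TYPESCRIPT | 31 |
| SWIFT | 30 |
| SCALA | 28 |
| KOTLIN | 27 |

Filtered dataset:

| Your main technology / programming language | n |
|---|---|
| JAVA | 194 |
| PYTHON | 173 |
| JAVASCRIPT | 103 |
| C | 69 |
| PHP | 64 |
| GO | 28 |
| SCALA | 27 |
| TYPESCRIPT | 26 |
| SWIFT | 25 |
| KOTLIN | 23 |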

Comparing the two datasets for the main technology/programming language used at work, the top ten languages remained the same even though the ordering changed. Python was the most common language in the original dataset, while Java is the most common in the filtered dataset.

Salary Distributions

The mean salary in the original dataset is 71,655 euros and the median salary is 70,000 euros. The mean salary in the filtered dataset is 73,557 euros and the median salary is 70,000 euros. The median is unchanged, and the mean rose by only about 1,900 euros (under 3%), so the two datasets barely differ.

Experience Vs Salary

Gender

Which gender gets paid more? It turns out that males get paid more in general.
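The plot behind this statement is not reproduced here. A numeric counterpart could be a grouped summary such as the sketch below, which uses a toy data frame with illustrative salaries rather than the survey data.

```r
library(dplyr)

# Toy stand-in for the filtered data (illustrative values only).
Salaries2 <- tibble(
  Gender      = c("Male", "Male", "Female", "Female", "Male"),
  Base_Salary = c(80000, 62000, 54000, 70000, 120000)
)

# Group size plus median and mean base salary per gender.
salary_by_gender <- Salaries2 %>%
  group_by(Gender) %>%
  summarise(n             = n(),
            median_salary = median(Base_Salary),
            mean_salary   = mean(Base_Salary))
salary_by_gender
```

Reporting `n` next to the medians matters here because the two groups differ so much in size.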

City

Which city has a higher salary range? It looks like the highest-paying cities are Berlin and Munich.

Technology/Programming Language

What technology/programming language should you learn? In order to receive a higher base salary, it looks like a person needs to know Python, Java, C, or Go.
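A comparison like this is often drawn as a box plot ordered by median salary. The following ggplot2 sketch is one possible way to do that, not necessarily the plot used in the report. The data are toy values, and `main_tech` is a shortened, hypothetical name for the "Your main technology / programming language" column.

```r
library(ggplot2)

# Toy stand-in for the filtered data (illustrative values only).
Salaries2 <- data.frame(
  main_tech   = c("GO", "GO", "PHP", "PHP", "JAVA", "JAVA"),
  Base_Salary = c(80000, 90000, 55000, 60000, 70000, 75000)
)

# reorder() sorts the technologies by their median salary so the
# boxes appear in ascending order; coord_flip() puts the labels on the y axis.
p <- ggplot(Salaries2,
            aes(x = reorder(main_tech, Base_Salary, FUN = median),
                y = Base_Salary)) +
  geom_boxplot() +
  coord_flip() +
  labs(x = "Main technology", y = "Base salary (EUR)")
p
```

Since the report loads plotly, passing `p` to `plotly::ggplotly()` would be one way to make the same plot interactive.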

Position

What position should you aim to work as? It turns out that the highest-paying positions are data scientist, software engineer, and backend developer.

Seniority Level

What seniority level should you strive for? To receive a higher base salary, a person should strive for a more senior or management-level role such as head, lead, or senior.

Company Size

What size company should you work at? It turns out that company size doesn’t matter much for salary, although, in general, a company with 50+ people is the better choice.
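Company size is stored as text bands, so it sorts alphabetically unless it is turned into an ordered factor. The sketch below shows one way to do this. The bands "11-50", "51-100" and "101-1000" appear in the preview rows; "up to 10" and "1000+" are assumed labels, and the salaries are toy values.

```r
library(dplyr)

# Size bands in ascending order ("up to 10" and "1000+" are assumed labels).
size_levels <- c("up to 10", "11-50", "51-100", "101-1000", "1000+")

# Toy stand-in for the filtered data (illustrative values only).
Salaries2 <- tibble(
  `Company size` = c("101-1000", "11-50", "51-100", "1000+", "11-50"),
  Base_Salary    = c(80000, 76000, 54000, 85000, 60000)
)

# Convert the bands to an ordered factor so summaries and plots
# follow company size rather than alphabetical order.
size_summary <- Salaries2 %>%
  mutate(`Company size` = factor(`Company size`, levels = size_levels, ordered = TRUE)) %>%
  group_by(`Company size`) %>%
  summarise(n = n(), median_salary = median(Base_Salary))
size_summary
```

With the factor in place, a question like "50+ people versus smaller" becomes a simple comparison on the ordered levels.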

Company Type

What type of company should you work at? To receive a higher base salary, a person should try to work at a startup or a product company.

Summary

This analysis is intended to help aspiring and current IT professionals decide which profession to enter or transition into, which tools to learn, which city to live in, and what size and kind of company to work for. Hopefully, through data visualization, people can decide which profession to pursue.

Insights

It turns out that, to make a decent living in Germany, someone should work at a product company or startup; know Go, Python, Java, or C; live in Berlin or Munich; work as a data scientist, software engineer, or backend developer; work at a company with 50+ people; and work their way up to some kind of management position. It also appears that a person who knows English can move to Germany and transition into a Germany-based company with little difficulty, because English is widely used at work.

My analysis was limited by the small number of responses for most cities, for many main programming languages, and for female respondents. There is a bias towards Berlin and Munich because these two cities had far more respondents than every other city. The comparison of pay by gender is also biased because there were far more male respondents.

Next, I would like to use the previous years’ data and perform a time series analysis. I believe this will give me a larger pool of data and a more accurate representation of European IT professionals and their salaries, skills, and professions.